In this lab, you will practice methods for dealing with missing data when using lavaan to fit latent variable models.


1 Preliminaries


1.1 Data


For this lab, we will go back to working with the Eating Attitudes data from Lab 4. These data are available as eating_attitudes.rds. Unlike the completed version that we analyzed in Lab 4, we will now work with the incomplete version that is actually analyzed in Enders (2010).

As before, this dataset includes 400 observations of the following 14 variables. Note that the variables are listed in the order that they appear in the dataset.

  • id: A numeric ID
  • eat1:eat24: Seven indicators of a Drive for Thinness construct
  • eat3:eat21: Three indicators of a Preoccupation with Food construct
  • bmi: Body mass index
  • wsb: A single item assessing Western Standards of Beauty
  • anx: A single item assessing Anxiety Level

You can download the original data here, and you can access the code used to process the data here.


1.1.1

Read in the eating_attitudes.rds dataset.

dataDir <- "../../data/"
ea      <- readRDS(paste0(dataDir, "eating_attitudes.rds"))

NOTE:

  1. In the following, I will refer to these data as the EA data.
  2. Unless otherwise specified, the data analyzed in all following questions are the EA data.

1.1.2

Summarize the EA data to get a sense of their characteristics.

  • Pay attention to the missing values.
head(ea)
summary(ea)
       id             eat1            eat2           eat10      
 Min.   :  1.0   Min.   :2.000   Min.   :1.000   Min.   :1.000  
 1st Qu.:100.8   1st Qu.:3.000   1st Qu.:3.000   1st Qu.:3.000  
 Median :200.5   Median :4.000   Median :4.000   Median :4.000  
 Mean   :200.5   Mean   :3.987   Mean   :3.939   Mean   :3.938  
 3rd Qu.:300.2   3rd Qu.:5.000   3rd Qu.:4.000   3rd Qu.:4.000  
 Max.   :400.0   Max.   :7.000   Max.   :7.000   Max.   :7.000  
                 NA's   :27      NA's   :21      NA's   :30     
     eat11           eat12           eat14           eat24      
 Min.   :1.000   Min.   :2.000   Min.   :1.000   Min.   :1.000  
 1st Qu.:3.000   1st Qu.:3.000   1st Qu.:3.000   1st Qu.:3.000  
 Median :4.000   Median :4.000   Median :4.000   Median :4.000  
 Mean   :3.938   Mean   :3.905   Mean   :3.962   Mean   :3.954  
 3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:4.000  
 Max.   :7.000   Max.   :7.000   Max.   :7.000   Max.   :7.000  
                 NA's   :33                      NA's   :33     
      eat3           eat18           eat21            bmi       
 Min.   :1.000   Min.   :2.000   Min.   :1.000   Min.   :15.23  
 1st Qu.:3.000   1st Qu.:3.000   1st Qu.:3.000   1st Qu.:20.43  
 Median :4.000   Median :4.000   Median :4.000   Median :22.42  
 Mean   :3.967   Mean   :3.951   Mean   :3.953   Mean   :22.39  
 3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:24.21  
 Max.   :7.000   Max.   :7.000   Max.   :7.000   Max.   :31.52  
                 NA's   :32      NA's   :19      NA's   :5      
      wsb              anx       
 Min.   : 4.000   Min.   : 4.00  
 1st Qu.: 8.000   1st Qu.:10.00  
 Median : 9.000   Median :12.00  
 Mean   : 8.946   Mean   :11.94  
 3rd Qu.:10.000   3rd Qu.:14.00  
 Max.   :14.000   Max.   :20.00  
 NA's   :27       NA's   :18     
str(ea)
'data.frame':   400 obs. of  14 variables:
 $ id   : num  1 2 3 4 5 6 7 8 9 10 ...
 $ eat1 : num  4 6 3 3 3 4 5 4 4 6 ...
 $ eat2 : num  4 5 3 3 2 5 4 3 7 5 ...
 $ eat10: num  4 6 2 4 3 4 4 4 6 5 ...
 $ eat11: num  4 6 2 3 3 5 4 4 5 5 ...
 $ eat12: num  4 6 3 4 3 4 4 4 4 6 ...
 $ eat14: num  4 7 2 4 3 4 4 4 6 6 ...
 $ eat24: num  3 6 3 3 3 4 4 4 4 5 ...
 $ eat3 : num  4 5 3 3 4 4 3 6 4 5 ...
 $ eat18: num  5 6 3 5 4 5 3 6 4 6 ...
 $ eat21: num  4 5 2 4 4 4 3 5 4 5 ...
 $ bmi  : num  18.9 26 18.3 18.2 24.4 ...
 $ wsb  : num  9 13 6 5 10 7 11 8 10 12 ...
 $ anx  : num  11 19 8 14 7 11 12 12 14 12 ...

1.2 Missing Data Screening


Before we can make any headway with missing data treatment or substantive analyses, we need to get a handle on the extent of our missing data problem. So, we will begin with some descriptive analysis of the missing data.


1.2.1

Calculate the proportion of missing data for each variable.

library(dplyr)

## Calculate variablewise proportions missing:
ea %>% is.na() %>% colMeans()
    id   eat1   eat2  eat10  eat11  eat12  eat14  eat24   eat3  eat18  eat21 
0.0000 0.0675 0.0525 0.0750 0.0000 0.0825 0.0000 0.0825 0.0000 0.0800 0.0475 
   bmi    wsb    anx 
0.0125 0.0675 0.0450 
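
The same is.na() trick also yields casewise missingness rates, which tell you how badly affected individual rows are. Since the results above come from the EA data, here is a self-contained toy sketch (the data frame toy and its values are invented for illustration):

```r
## Toy data frame with some missing values (invented for illustration):
toy <- data.frame(x = c(1, NA, 3, 4),
                  y = c(NA, NA, 6, 8))

## Variablewise proportions missing (as above):
colMeans(is.na(toy)) # x = 0.25, y = 0.50

## Casewise proportions missing (share of variables missing per row):
rowMeans(is.na(toy)) # 0.50, 1.00, 0.00, 0.00
```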

1.2.2

Calculate the covariance coverage rates.

  • You can use the md.pairs() function from the mice package to count the number of jointly observed cases for every pair of variables.
library(mice)     # Provides md.pairs()
library(magrittr) # Provides the extract2() alias function

## Calculate covariance coverage:
cc <- (ea %>% select(-id) %>% md.pairs() %>% extract2("rr")) /  nrow(ea)
cc %>% round(2)
      eat1 eat2 eat10 eat11 eat12 eat14 eat24 eat3 eat18 eat21  bmi  wsb  anx
eat1  0.93 0.89  0.87  0.93  0.86  0.93  0.86 0.93  0.86  0.89 0.92 0.88 0.90
eat2  0.89 0.95  0.87  0.95  0.87  0.95  0.88 0.95  0.88  0.90 0.94 0.88 0.90
eat10 0.87 0.87  0.92  0.92  0.86  0.92  0.86 0.92  0.86  0.88 0.92 0.86 0.88
eat11 0.93 0.95  0.92  1.00  0.92  1.00  0.92 1.00  0.92  0.95 0.99 0.93 0.96
eat12 0.86 0.87  0.86  0.92  0.92  0.92  0.85 0.92  0.86  0.87 0.91 0.87 0.87
eat14 0.93 0.95  0.92  1.00  0.92  1.00  0.92 1.00  0.92  0.95 0.99 0.93 0.96
eat24 0.86 0.88  0.86  0.92  0.85  0.92  0.92 0.92  0.86  0.88 0.90 0.85 0.88
eat3  0.93 0.95  0.92  1.00  0.92  1.00  0.92 1.00  0.92  0.95 0.99 0.93 0.96
eat18 0.86 0.88  0.86  0.92  0.86  0.92  0.86 0.92  0.92  0.88 0.91 0.86 0.88
eat21 0.89 0.90  0.88  0.95  0.87  0.95  0.88 0.95  0.88  0.95 0.94 0.89 0.91
bmi   0.92 0.94  0.92  0.99  0.91  0.99  0.90 0.99  0.91  0.94 0.99 0.92 0.94
wsb   0.88 0.88  0.86  0.93  0.87  0.93  0.85 0.93  0.86  0.89 0.92 0.93 0.89
anx   0.90 0.90  0.88  0.96  0.87  0.96  0.88 0.96  0.88  0.91 0.94 0.89 0.96

1.2.3

Summarize the coverages from 1.2.2.

Covariance coverage matrices are often very large and, hence, difficult to parse. It can be useful to distill this information into a few succinct summaries that make the important information easier to extract.

One of the most useful numeric summaries is the range. We’ll start there.

NOTE:

  • When computing the range, it is often helpful to exclude coverages of 1.0 since variables that are fully jointly observed won’t have much direct influence on our missing data treatment.
  • We usually want to exclude the diagonal elements from the coverage matrix, too. These values represent variance coverages instead of covariance coverages. In other words, the diagonal of the coverage matrix gives the proportion of observed cases for each variable.
## Range of coverages < 1.0:
## NOTE: Use lower.tri() to select only the elements below the diagonal.
uniqueCc <- cc[lower.tri(cc)] 
uniqueCc[uniqueCc < 1] %>% range()
[1] 0.8500 0.9875

Sometimes, we may want to count the number of coverages that satisfy some condition (e.g., fall below some threshold). When doing such a summary, we need to remember two idiosyncrasies of the covariance coverage matrix:

  1. The diagonal elements are not covariance coverages.
  2. The matrix is symmetric.

So, to get a count of covariance coverages without double counting, we consider only the elements from the lower (or upper) triangle.

## Count the number of coverages lower than 0.9:
sum(uniqueCc < 0.9)
[1] 32

1.2.4

Visualize the covariance coverage rates from 1.2.2.

As with numeric summaries, visualizations are also a good way to distill meaningful knowledge from the raw information in a covariance coverage matrix.

We’ll try two different visualizations.

First, we’ll create a simple histogram of the coverages to get a sense of their distribution. To see why such a visualization is useful, consider two hypothetical situations wherein the range of coverages is [0.2; 0.8].

  1. Half of these coverages are lower than 0.3
  2. Only one of the coverages is lower than 0.3

Clearly, the second situation is less problematic, but we cannot differentiate between these two simply by examining the range of coverages.

library(ggplot2)

## Simple histogram of coverages:
data.frame(Coverage = uniqueCc) %>% 
  ggplot(aes(Coverage)) + 
  geom_histogram(bins = 15, color = "black")

Next, we’ll create a heatmap of the coverage matrix itself. Such a visualization could help us locate clusters of variables with especially low coverages, for example.

## Convert the coverage matrix into a plottable, tidy-formatted data.frame:
pDat <- data.frame(Coverage = as.numeric(cc), 
                   x        = rep(colnames(cc), ncol(cc)), 
                   y        = rep(colnames(cc), each = ncol(cc))
                   )

## Create the heatmap via the "tile" geom:
ggplot(pDat, aes(x = x, y = y, fill = Coverage)) + 
  geom_tile() + 
  scale_x_discrete(limits = rev(colnames(cc))) +
  scale_y_discrete(limits = colnames(cc)) + 
  scale_fill_distiller(palette = "Oranges") + 
  xlab(NULL) + 
  ylab(NULL)


1.2.5

Visualize the missing data patterns.

  • How many unique response patterns are represented in the EA data?

HINT:

  • The plot_pattern() function from ggmice will create a nice visualization of the patterns.
  • The md.pattern() function from mice will create a (somewhat less beautiful) visualization but will return a numeric pattern matrix that you can further analyze.
library(ggmice)

## Visualize the response patterns:
plot_pattern(ea)

## Create an analyzable pattern matrix (without visualization):
(pats <- md.pattern(ea, plot = FALSE))
    id eat11 eat14 eat3 bmi anx eat21 eat2 eat1 wsb eat10 eat18 eat12 eat24    
242  1     1     1    1   1   1     1    1    1   1     1     1     1     1   0
12   1     1     1    1   1   1     1    1    1   1     1     1     1     0   1
10   1     1     1    1   1   1     1    1    1   1     1     1     0     1   1
2    1     1     1    1   1   1     1    1    1   1     1     1     0     0   2
8    1     1     1    1   1   1     1    1    1   1     1     0     1     1   1
3    1     1     1    1   1   1     1    1    1   1     1     0     1     0   2
3    1     1     1    1   1   1     1    1    1   1     1     0     0     1   2
2    1     1     1    1   1   1     1    1    1   1     1     0     0     0   3
14   1     1     1    1   1   1     1    1    1   1     0     1     1     1   1
2    1     1     1    1   1   1     1    1    1   1     0     1     1     0   2
1    1     1     1    1   1   1     1    1    1   1     0     1     0     1   2
2    1     1     1    1   1   1     1    1    1   1     0     0     1     1   2
2    1     1     1    1   1   1     1    1    1   1     0     0     0     1   3
12   1     1     1    1   1   1     1    1    1   0     1     1     1     1   1
3    1     1     1    1   1   1     1    1    1   0     1     1     0     1   2
3    1     1     1    1   1   1     1    1    1   0     1     0     0     1   3
1    1     1     1    1   1   1     1    1    1   0     0     1     0     1   3
5    1     1     1    1   1   1     1    1    0   1     1     1     1     1   1
1    1     1     1    1   1   1     1    1    0   1     1     1     1     0   2
2    1     1     1    1   1   1     1    1    0   1     1     1     0     1   2
2    1     1     1    1   1   1     1    1    0   1     1     1     0     0   3
2    1     1     1    1   1   1     1    1    0   1     0     1     1     0   3
3    1     1     1    1   1   1     1    1    0   0     1     1     1     1   2
2    1     1     1    1   1   1     1    1    0   0     0     1     1     1   3
9    1     1     1    1   1   1     1    0    1   1     1     1     1     1   1
1    1     1     1    1   1   1     1    0    1   1     1     1     1     0   2
1    1     1     1    1   1   1     1    0    1   1     1     1     0     1   2
2    1     1     1    1   1   1     1    0    1   1     1     0     1     1   2
3    1     1     1    1   1   1     1    0    1   1     1     0     1     0   3
1    1     1     1    1   1   1     1    0    1   0     1     1     1     1   2
3    1     1     1    1   1   1     1    0    0   1     1     1     1     1   2
13   1     1     1    1   1   1     0    1    1   1     1     1     1     1   1
1    1     1     1    1   1   1     0    1    1   1     1     1     1     0   2
1    1     1     1    1   1   1     0    1    1   1     0     1     1     0   3
1    1     1     1    1   1   1     0    1    1   1     0     0     1     1   3
1    1     1     1    1   1   1     0    1    1   0     1     0     1     1   3
1    1     1     1    1   1   1     0    1    0   0     1     0     1     1   4
10   1     1     1    1   1   0     1    1    1   1     1     1     1     1   1
1    1     1     1    1   1   0     1    1    1   1     1     0     1     1   2
4    1     1     1    1   1   0     1    1    0   1     1     1     1     1   2
1    1     1     1    1   1   0     1    1    0   1     1     1     1     0   3
1    1     1     1    1   1   0     1    0    1   1     1     1     1     1   2
1    1     1     1    1   1   0     0    1    1   1     1     1     1     1   2
3    1     1     1    1   0   1     1    1    1   1     1     1     1     1   1
1    1     1     1    1   0   1     1    1    1   1     0     1     0     1   3
1    1     1     1    1   0   1     1    1    0   1     0     1     1     1   3
     0     0     0    0   5  18    19   21   27  27    30    32    33    33 245
## Count the number of unique response patterns:
nrow(pats) - 1
[1] 46

As shown by the above code, there are 46 unique response patterns in the EA data.
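
For intuition, the pattern count is simply the number of distinct rows in the missingness indicator matrix. The following base-R sketch illustrates the idea on an invented toy data frame (not the EA data):

```r
## Toy data frame with three distinct response patterns (invented values):
toy <- data.frame(x = c(1, NA, 3, 4),
                  y = c(NA, NA, 6, 8))

## Collapse each row of the missingness indicator matrix into a pattern label:
pats <- apply(is.na(toy) * 1, 1, paste, collapse = "")

## "01" = y missing, "11" = both missing, "00" = fully observed:
table(pats)

## Count the unique response patterns:
length(unique(pats)) # 3
```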


1.2.6

Calculate the fraction of missing information for the summary statistics.

The fraction of missing information (FMI) is a crucial statistic for quantifying the extent to which missing data (and their treatment) affect parameter estimates. We generally calculate the FMI for the estimates of our substantive model parameters, but we can also use the FMI as a descriptive tool before fitting any substantive model.

  • The FMI for the sufficient statistics (e.g., means, variances, and covariances) of our data can give us a good idea of what level of FMI to expect for models fit to those data.
  • With the fmi() function from semTools, we can efficiently compute the FMI for sufficient statistics using the FIML-based approach described by Savalei and Rhemtulla (2012).
library(semTools)

fmi <- ea %>% select(-id) %>% fmi()

fmi$Covariances$fmi
      eat1  eat2  eat10 eat11 eat12 eat14 eat24 eat3  eat18 eat21 bmi   wsb   anx  
eat1  0.063                                                                        
eat2  0.067 0.030                                                                  
eat10 0.073 0.032 0.043                                                            
eat11 0.029 0.014 0.023 0.000                                                      
eat12 0.090 0.056 0.072 0.061 0.080                                                
eat14 0.031 0.008 0.019 0.000 0.040 0.000                                          
eat24 0.096 0.088 0.064 0.062 0.071 0.055 0.087                                    
eat3  0.050 0.018 0.025 0.000 0.075 0.000 0.077 0.000                              
eat18 0.101 0.068 0.065 0.049 0.074 0.041 0.116 0.037 0.071                        
eat21 0.052 0.044 0.048 0.023 0.080 0.016 0.091 0.017 0.062 0.044                  
bmi   0.059 0.032 0.035 0.006 0.058 0.004 0.090 0.010 0.063 0.033 0.011            
wsb   0.083 0.096 0.097 0.035 0.127 0.061 0.093 0.030 0.091 0.078 0.064 0.067      
anx   0.076 0.050 0.039 0.025 0.089 0.037 0.085 0.042 0.084 0.065 0.039 0.093 0.056
fmi$Means %>% select(variable, fmi) %>% print(digits = 3)
    variable   fmi
92      eat1 0.038
93      eat2 0.030
94     eat10 0.032
95     eat11 0.000
96     eat12 0.052
97     eat14 0.000
98     eat24 0.053
99      eat3 0.000
100    eat18 0.046
101    eat21 0.023
102      bmi 0.010
103      wsb 0.060
104      anx 0.028

1.3 Naive Analysis


Of course, we can estimate models in lavaan without doing anything fancy to the missing data. If we fit a model to incomplete data using lavaan, the software will automatically apply listwise deletion to remove any missing data before estimating the model.

Although you should almost never use deletion-based treatments, we’ll start our modeling exercises by fitting a model with the default complete case analysis. In practice, something can go wrong with the modeling, and you may end up deleting cases accidentally. If you’re keyed into the signs of listwise deletion, you’ll know what to check when working with these methods in the wild.
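
One telltale sign is a mismatch between the number of observations lavaan reports as “Used” and the number of rows in your data. Base R’s complete.cases() predicts exactly how many rows listwise deletion will drop; here is a self-contained sketch on an invented toy data frame:

```r
## Toy data frame with incomplete rows (invented for illustration):
toy <- data.frame(x = c(1, NA, 3, 4),
                  y = c(NA, 2, 6, 8))

## Rows with no missing values survive listwise deletion:
sum(complete.cases(toy)) # 2 cases used

## Rows that listwise deletion discards:
nrow(toy) - sum(complete.cases(toy)) # 2 cases lost
```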


1.3.1

Define the model syntax for the CFA.

The data only contain multi-item scales for Drive for Thinness and Preoccupation with Food, so we only need a two-dimensional CFA evaluating the measurement structure of these two factors.

  • Indicate the Drive for Thinness factor from the seven relevant scale items.
    • eat1, eat2, eat10, eat11, eat12, eat14, eat24
  • Indicate the Preoccupation with Food factor from the remaining three scale items.
    • eat3, eat18, eat21
cfaMod <- '
drive =~ eat1 + eat2  + eat10 + eat11 + eat12 + eat14 + eat24
pre   =~ eat3 + eat18 + eat21
'

1.3.2

Estimate the CFA model on the EA data.

  • Correlate the latent factors.
  • Set the scale by standardizing the latent factors.
  • Estimate the mean structure.
  • Use complete case analysis to treat the missing data.
library(lavaan) # Provides cfa()

naiveCfa <- cfa(cfaMod, data = ea, std.lv = TRUE, meanstructure = TRUE)

1.3.3

Summarize the fitted CFA and check the model fit.

  • Do the parameter estimates look sensible?
  • Does the model fit the data well enough?
  • How many observations were deleted?
summary(naiveCfa)
lavaan 0.6-12 ended normally after 25 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        31

                                                  Used       Total
  Number of observations                           267         400

Model Test User Model:
                                                      
  Test statistic                                57.511
  Degrees of freedom                                34
  P-value (Chi-square)                           0.007

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  drive =~                                            
    eat1              0.718    0.057   12.524    0.000
    eat2              0.661    0.053   12.391    0.000
    eat10             0.814    0.051   15.913    0.000
    eat11             0.751    0.047   15.975    0.000
    eat12             0.668    0.055   12.207    0.000
    eat14             0.943    0.050   18.852    0.000
    eat24             0.628    0.055   11.462    0.000
  pre =~                                              
    eat3              0.763    0.053   14.368    0.000
    eat18             0.698    0.055   12.659    0.000
    eat21             0.879    0.052   16.829    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  drive ~~                                            
    pre               0.647    0.044   14.824    0.000

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .eat1              3.884    0.064   60.958    0.000
   .eat2              3.831    0.059   64.776    0.000
   .eat10             3.884    0.061   63.654    0.000
   .eat11             3.835    0.056   68.305    0.000
   .eat12             3.888    0.060   64.296    0.000
   .eat14             3.880    0.064   61.029    0.000
   .eat24             3.884    0.060   65.144    0.000
   .eat3              3.884    0.059   65.403    0.000
   .eat18             3.895    0.060   65.119    0.000
   .eat21             3.869    0.061   63.527    0.000
    drive             0.000                           
    pre               0.000                           

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .eat1              0.568    0.053   10.710    0.000
   .eat2              0.497    0.046   10.736    0.000
   .eat10             0.332    0.034    9.657    0.000
   .eat11             0.278    0.029    9.626    0.000
   .eat12             0.530    0.049   10.770    0.000
   .eat14             0.191    0.027    7.164    0.000
   .eat24             0.555    0.051   10.896    0.000
   .eat3              0.359    0.043    8.307    0.000
   .eat18             0.468    0.049    9.615    0.000
   .eat21             0.217    0.042    5.110    0.000
    drive             1.000                           
    pre               1.000                           
fitMeasures(naiveCfa)
               npar                fmin               chisq                  df 
             31.000               0.108              57.511              34.000 
             pvalue      baseline.chisq         baseline.df     baseline.pvalue 
              0.007            1504.751              45.000               0.000 
                cfi                 tli                nnfi                 rfi 
              0.984               0.979               0.979               0.949 
                nfi                pnfi                 ifi                 rni 
              0.962               0.727               0.984               0.984 
               logl   unrestricted.logl                 aic                 bic 
          -3027.345           -2998.589            6116.690            6227.895 
             ntotal                bic2               rmsea      rmsea.ci.lower 
            267.000            6129.606               0.051               0.027 
     rmsea.ci.upper        rmsea.pvalue                 rmr          rmr_nomean 
              0.073               0.447               0.036               0.039 
               srmr        srmr_bentler srmr_bentler_nomean                crmr 
              0.037               0.037               0.040               0.040 
        crmr_nomean          srmr_mplus   srmr_mplus_nomean               cn_05 
              0.045               0.037               0.040             226.640 
              cn_01                 gfi                agfi                pgfi 
            261.267               0.995               0.990               0.520 
                mfi                ecvi 
              0.957               0.448 

This model looks fine. All measurement model parameters seem reasonable, and the model fits the data well. That being said, 133 observations were deleted to “fix” the missing data. Also, we can be reasonably certain that these estimates are biased, unless the data were all missing completely at random (MCAR). Even though the estimates look “good”, they can still be wrong.
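
To see why deletion-based estimates are suspect, consider a quick simulation (entirely invented data, not the EA data). When missingness on y depends on an observed variable x (i.e., a MAR mechanism), the complete-case mean of y is biased away from the true mean of zero:

```r
set.seed(235711)

## Simulate two correlated variables:
n <- 1e5
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)

## Impose MAR missingness: delete y whenever x is large:
yMiss <- ifelse(x > 0.5, NA, y)

## The complete-case mean of y is pulled well below the true mean of 0:
mean(yMiss, na.rm = TRUE) # approximately -0.25
```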


2 Multiple Imputation


Now, we’ll get into the real meat of the lab and move into the realm of recommended approaches. Specifically, we will treat the missing data via principled methods that correct the missing data problem through a theoretically motivated model for the missing values.

The first such approach we will consider is multiple imputation (MI). MI is a pre-processing method that treats the missing data as a separate step prior to estimating the analysis model. Consequently, the first step in our journey toward an MI-based analysis will be multiply imputing the missing data.


2.1 Imputation


2.1.1

Multiply impute the missing data.

Use the mice::mice() function to impute the missing data using the following settings.

  • Create 20 imputations.
  • Use 25 iterations of the MICE algorithm.
  • Use predictive mean matching (PMM) as the elementary imputation method.
  • Set the pseudo random number seed to 235711.

HINT:

  • If you are unsure of how to implement these options check the documentation for the mice() function.
  • If you are really, really stuck, unhide the solution below.

View the parameterized mice() call.

## Generate 20 imputations of the missing data:
miceOut <- mice(ea, m = 20, maxit = 25, method = "pmm", seed = 235711)

2.1.2

Create a list of imputed datasets from the mids object generated in 2.1.1.

The mice() function returns a mids object that contains the imputations and a bunch of other metadata but not the imputed datasets.

  • We use the mice::complete() function to create the final imputed datasets.
  • Return the imputed data as a list (with length 20) of multiply imputed datasets.
## Replace the missing values with the imputations (and return a list of datasets):
eaImp <- complete(miceOut, "all")

2.2 Diagnostics


MI operates by filling the missing values with samples from the posterior predictive distribution (PPD) of the missing data. This PPD is essentially just a distribution of predictions generated from some Bayesian regression model. These models are usually estimated via Markov Chain Monte Carlo (MCMC) methods.

MCMC is a stochastic estimation procedure that estimates a probabilistic model by randomly sampling from the joint distribution of the model parameters (the so-called posterior distribution in Bayesian models) to build up an empirical representation of this distribution. When using MCMC, the estimated model (and, in our case, the imputations) is only valid if the MCMC algorithm has converged.

Therefore, before we can proceed with the analysis, we need to check two important criteria:

  1. The MCMC algorithm has converged onto a stationary solution (i.e., the MCMC samples represent valid draws from some stable distribution).
  2. The imputations are sensible (i.e., they look like reasonable replacements for the missing data).

For more information on MCMC estimation, convergence, and diagnostics for MI models, see Sections 4.5, 6.5, and 6.6 of Van Buuren (2019).


2.2.1

Create traceplots of the imputations’ means and variances.

One especially expedient way to evaluate convergence is by plotting the iteration history of the means and variances of the imputed values for each variable. Once the MCMC algorithm has converged, the individual lines in these plots (each of which corresponds to one imputation and represents an independent Markov Chain) should mix thoroughly, and the overall mass of the tracelines should show no trend.

  • Use the plot_trace() function from ggmice to create traceplots for each imputed variable.
  • Do these plots suggest convergence?
## Get the names of the imputed variables:
targets <- which(miceOut$method != "") %>% names()

## Create traceplots of the imputed means and variances:
for(v in targets)
    plot_trace(miceOut, v) %>% print()

The plots show well-mixed tracelines for each imputation, and there is no evidence of a global trend. So, these plots suggest convergence.


We can also evaluate convergence via numeric statistics such as the Potential Scale Reduction Factor, \(\hat{R}\) (Gelman & Rubin, 1992).
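
To demystify the statistic, here is a minimal hand computation of the classic Gelman-Rubin PSRF on two invented chains. This is only a sketch of the formula, not a reimplementation of the mice-specific routines:

```r
## Two invented Markov chains, stored as columns:
chains <- cbind(c(1.0, 1.2, 0.9, 1.1),
                c(1.1, 0.8, 1.0, 1.2))

n <- nrow(chains) # iterations per chain

## Within-chain variance (W) and between-chain variance (B):
W <- mean(apply(chains, 2, var))
B <- n * var(colMeans(chains))

## Pooled variance estimate and the PSRF:
varPlus <- (n - 1) / n * W + B / n
sqrt(varPlus / W) # values near 1.0 suggest convergence
```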


2.2.2

Compute the \(\hat{R}\) statistic for the means and variances of the imputations.

For mids objects, we can use the Rhat.mice() function from miceadds to compute the \(\hat{R}\) statistics for the means and variances of the imputed data.

  • At convergence, these statistics should all be close to 1.0 (i.e., \(\hat{R} < 1.1\))
  • Do the \(\hat{R}\) statistics suggest convergence?
library(miceadds)

## Compute the PSR factors for the means and variances of the imputed data:
Rhat.mice(miceOut)
   variable MissProp Rhat.M.imp Rhat.Var.imp
1        id     0.00         NA           NA
2      eat1     6.75  1.0054048    1.0012331
3      eat2     5.25  1.0073912    1.0177948
4     eat10     7.50  1.0111193    0.9995807
5     eat11     0.00         NA           NA
6     eat12     8.25  1.0096568    1.0105978
7     eat14     0.00         NA           NA
8     eat24     8.25  0.9953888    0.9972065
9      eat3     0.00         NA           NA
10    eat18     8.00  1.0024234    1.0000535
11    eat21     4.75  1.0013914    1.0022511
12      bmi     1.25  0.9993516    1.0126109
13      wsb     6.75  1.0043349    1.0017893
14      anx     4.50  0.9998069    0.9952062

Yes, all \(\hat{R}\) statistics are less than 1.1 (and very near 1.0), so these statistics suggest convergence.


After we confirm that the imputation model has converged, we still need to sanity check the imputed values themselves. A convergent imputation model can still produce completely ridiculous and useless imputations. The fact that the MCMC algorithm has converged onto a stationary distribution does not mean that this distribution represents a sensible imputation model.

One of the best ways to check the plausibility of the imputations is with plots that juxtapose the imputed and observed data. Since we’re comparing distributions of numeric variables, kernel density plots are a good option.


2.2.3

Create kernel density plots of the imputed and observed data.

We can create “quick-and-dirty” density plots via the mice::densityplot() function, but ggmice::ggmice() allows us to create the same figures with all the control provided by ggplot(). For more details check this page.

  • Do the imputed values look reasonable?
for(v in targets) {
  p <- ggmice(miceOut, aes_string(v, group = ".imp", size = ".where")) + 
    geom_density() + 
    scale_size_manual(values = c("observed" = 1, "imputed" = 0.5),
                      guide  = "none") +
    theme(legend.position = "none")
  print(p)
}

Yes, the imputed values all seem plausible. The kernel densities for the imputed data all seem as though they could represent samples from a population similar to the one that generated the complete data.

  • Note that the populations that generate the observed and imputed data need not be the same unless the data are MCAR.

3 MI-Based Analysis


Assuming our imputations pass the above checks, it’s time to move into the analysis phase of the process. In practice, we will lean heavily on routines from the semTools package to implement both the analysis and pooling phases.

  • To begin with, we’ll use semTools::lavaan.mi() to fit our models to the list of multiply imputed datasets from 2.1.2.
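
Before handing things off to semTools, it helps to see what the pooling phase actually does. Below is a minimal hand computation of Rubin’s rules for a single parameter, using invented estimates and standard errors from m = 3 hypothetical imputed-data analyses (toy numbers only, not output from our models):

```r
## Point estimates and SEs from m = 3 imputed-data analyses (invented numbers):
est <- c(0.62, 0.65, 0.60)
se  <- c(0.040, 0.038, 0.042)

m <- length(est)

## Rubin's rules:
poolEst <- mean(est)           # pooled point estimate
W       <- mean(se^2)          # within-imputation variance
B       <- var(est)            # between-imputation variance
totVar  <- W + (1 + 1 / m) * B # total variance
poolSE  <- sqrt(totVar)        # pooled standard error

## Relative increase in variance and the fraction of missing information
## (simple large-sample version):
riv <- (1 + 1 / m) * B / W
fmi <- (1 + 1 / m) * B / totVar
```

These quantities are the logic behind the FMI and RIV columns that the pooled model summaries report below.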

3.1 Basic CFA/SEM


3.1.1

Estimate the CFA defined in 1.3.1 on the multiply imputed data.

  • Correlate the latent factors.
  • Set the scale by standardizing the latent factors.
  • Estimate the mean structure.

HINT: The semTools::cfa.mi() function can be a big help here.

miCfa <- cfa.mi(cfaMod, data = eaImp, std.lv = TRUE, meanstructure = TRUE)

3.1.2

Summarize the fitted CFA and check the model fit.

  • Include the FMI statistics in your summary.
  • Use the D3 statistic to define the model fit indices.
  • Do the parameter estimates look sensible?
  • Does the model fit the data well enough?
  • Do any of the FMI estimates cause you to question the results?
summary(miCfa, fmi = TRUE) %>% print()

The RIV will exceed 1 whenever between-imputation variance exceeds within-imputation variance (when FMI(1) > 50%).

lavaan.mi object based on 20 imputed data sets. 
See class?lavaan.mi help page for available methods. 

Convergence information:
The model converged on 20 imputed data sets 

Rubin's (1987) rules were used to pool point and SE estimates across 20 imputed data sets, and to calculate degrees of freedom for each parameter's t test and CI.

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  t-value       df  P(>|t|)      FMI      RIV
  drive =~                                                                       
    eat1              0.731    0.051   14.419 5028.239    0.000    0.061    0.065
    eat2              0.652    0.047   13.982      Inf    0.000    0.018    0.018
    eat10             0.799    0.044   18.377 2831.970    0.000    0.082    0.089
    eat11             0.764    0.041   18.472      Inf    0.000    0.008    0.008
    eat12             0.662    0.047   14.032 3424.716    0.000    0.074    0.080
    eat14             0.901    0.043   20.951      Inf    0.000    0.005    0.005
    eat24             0.615    0.048   12.713 2035.951    0.000    0.097    0.107
  pre =~                                                                         
    eat3              0.774    0.048   16.223      Inf    0.000    0.038    0.040
    eat18             0.754    0.049   15.455 1637.941    0.000    0.108    0.121
    eat21             0.860    0.046   18.673 1924.002    0.000    0.099    0.110

Covariances:
                   Estimate  Std.Err  t-value       df  P(>|t|)      FMI      RIV
  drive ~~                                                                       
    pre               0.620    0.040   15.578 7586.696    0.000    0.050    0.053

Intercepts:
                   Estimate  Std.Err  t-value       df  P(>|t|)      FMI      RIV
   .eat1              4.012    0.056   71.848      Inf    0.000    0.043    0.045
   .eat2              3.939    0.051   77.279      Inf    0.000    0.019    0.020
   .eat10             3.948    0.051   76.769 8365.873    0.000    0.048    0.050
   .eat11             3.938    0.049   80.352      Inf    0.000    0.000    0.000
   .eat12             3.925    0.052   76.052 5793.515    0.000    0.057    0.061
   .eat14             3.962    0.053   74.316      Inf    0.000    0.000    0.000
   .eat24             3.986    0.052   76.893 4440.152    0.000    0.065    0.070
   .eat3              3.968    0.052   75.674      Inf    0.000    0.000    0.000
   .eat18             3.983    0.053   75.128 2448.087    0.000    0.088    0.097
   .eat21             3.949    0.052   75.410      Inf    0.000    0.025    0.026
    drive             0.000                                                      
    pre               0.000                                                      

Variances:
                   Estimate  Std.Err  t-value       df  P(>|t|)      FMI      RIV
   .eat1              0.613    0.049   12.466 2604.622    0.000    0.085    0.093
   .eat2              0.532    0.042   12.557 2021.696    0.000    0.097    0.107
   .eat10             0.334    0.030   11.119 3100.033    0.000    0.078    0.085
   .eat11             0.299    0.027   11.069      Inf    0.000    0.043    0.045
   .eat12             0.542    0.043   12.546 1229.291    0.000    0.124    0.142
   .eat14             0.234    0.026    9.142      Inf    0.000    0.042    0.044
   .eat24             0.610    0.048   12.781  636.995    0.000    0.173    0.209
   .eat3              0.412    0.042    9.853 1556.282    0.000    0.110    0.124
   .eat18             0.465    0.044   10.516  724.078    0.000    0.162    0.193
   .eat21             0.269    0.039    6.852  869.260    0.000    0.148    0.173
    drive             1.000                                                      
    pre               1.000                                                      
fitMeasures(miCfa, stat = "D3")
              chisq                  df              pvalue                ariv 
             46.156              34.000               0.080               0.189 
                fmi                npar              ntotal                logl 
              0.159              31.000             400.000           -4676.490 
  unrestricted.logl                 aic                 bic                bic2 
          -4649.044            9414.979            9538.715            9440.350 
     baseline.chisq         baseline.df     baseline.pvalue                 cfi 
           1741.955              45.000               0.000               0.993 
                rni                nnfi                 tli                 rfi 
              0.993               0.991               0.991               0.965 
                nfi                pnfi                 ifi                 mfi 
              0.974               0.736               0.993               0.985 
              rmsea      rmsea.ci.lower      rmsea.ci.upper        rmsea.pvalue 
              0.030               0.000               0.050               0.950 
           gammaHat         adjGammaHat                 rmr          rmr_nomean 
              0.994               0.990               0.030               0.033 
               srmr        srmr_bentler srmr_bentler_nomean                crmr 
              0.030               0.030               0.033               0.033 
        crmr_nomean          srmr_mplus   srmr_mplus_nomean 
              0.036               0.030               0.033 
  • This model looks good. All measurement model parameters seem reasonable, and the model fits the data very well.
  • None of the FMI values looks worryingly large.
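As the summary output notes, the RIV and FMI columns carry essentially the same information: for the asymptotic (large-\(M\)) FMI, \(RIV = FMI / (1 - FMI)\). As a quick sanity check, we can recover the printed RIV from the printed FMI (using the eat24 residual variance row above; values match up to rounding):

```r
## RIV = FMI / (1 - FMI), so the two columns are one-to-one transformations:
fmi <- 0.173                # printed FMI for the eat24 residual variance
riv <- fmi / (1 - fmi)
round(riv, 3)               # matches the printed RIV of 0.209

## RIV crosses 1 exactly when FMI crosses 0.5, as the warning message says:
0.5 / (1 - 0.5)
```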

Now, we will estimate a latent regression model wherein the preoccupation with food factor and the drive for thinness factor are correlated and regressed onto anxiety, Western beauty standards, and BMI.


3.1.3

Define the model syntax for the SEM described above.

  • Label the regression slopes for anxiety and Western beauty standards.
    • Give each slope a distinct label (you should end up with four unique labels).
  • Do not include the intercepts in the latent regression equations.
## Add the latent regression paths to the CFA syntax:
semMod <- paste(cfaMod, 
                'pre   ~ b1p * anx + b2p * wsb + bmi', 
                'drive ~ b1d * anx + b2d * wsb + bmi', 
                sep = '\n')

3.1.4

Estimate the SEM on the multiply imputed data.

  • Covary the latent factors.
  • Set the scale by standardizing the latent variables.

HINT: The semTools::sem.mi() function can be a big help here.

miSem1 <- sem.mi(semMod, data = eaImp, std.lv = TRUE)

3.1.5

Summarize the fitted SEM and interpret the results.

  • Include the FMI statistics in your summary.
  • Does anxiety significantly predict preoccupation with food and/or drive for thinness after controlling for Western beauty standards and BMI?
  • Does Western beauty standards significantly predict preoccupation with food and/or drive for thinness after controlling for anxiety and BMI?
summary(miSem1, fmi = TRUE) %>% print()

The RIV will exceed 1 whenever between-imputation variance exceeds within-imputation variance (when FMI(1) > 50%).

lavaan.mi object based on 20 imputed data sets. 
See class?lavaan.mi help page for available methods. 

Convergence information:
The model converged on 20 imputed data sets 

Rubin's (1987) rules were used to pool point and SE estimates across 20 imputed data sets, and to calculate degrees of freedom for each parameter's t test and CI.

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  t-value       df  P(>|t|)      FMI      RIV
  drive =~                                                                       
    eat1              0.558    0.039   14.390 5413.411    0.000    0.059    0.063
    eat2              0.496    0.036   13.924      Inf    0.000    0.038    0.039
    eat10             0.590    0.034   17.272 2261.225    0.000    0.092    0.101
    eat11             0.572    0.032   17.716 6730.312    0.000    0.053    0.056
    eat12             0.501    0.036   13.894 2425.844    0.000    0.089    0.097
    eat14             0.667    0.034   19.493 6530.932    0.000    0.054    0.057
    eat24             0.469    0.037   12.735 3366.363    0.000    0.075    0.081
  pre =~                                                                         
    eat3              0.629    0.040   15.697 2596.048    0.000    0.086    0.094
    eat18             0.614    0.041   15.068 2682.808    0.000    0.084    0.092
    eat21             0.684    0.040   17.197 1208.616    0.000    0.125    0.143

Regressions:
                   Estimate  Std.Err  t-value       df  P(>|t|)      FMI      RIV
  pre ~                                                                          
    anx      (b1p)    0.185    0.022    8.363 2533.868    0.000    0.087    0.095
    wsb      (b2p)    0.120    0.033    3.668  966.249    0.000    0.140    0.163
    bmi               0.094    0.023    4.112      Inf    0.000    0.029    0.030
  drive ~                                                                        
    anx      (b1d)    0.194    0.021    9.228 2365.604    0.000    0.090    0.098
    wsb      (b2d)    0.178    0.032    5.610 2333.176    0.000    0.090    0.099
    bmi               0.138    0.022    6.178      Inf    0.000    0.031    0.032

Covariances:
                   Estimate  Std.Err  t-value       df  P(>|t|)      FMI      RIV
 .drive ~~                                                                       
   .pre               0.381    0.057    6.665 4408.903    0.000    0.066    0.070

Variances:
                   Estimate  Std.Err  t-value       df  P(>|t|)      FMI      RIV
   .eat1              0.592    0.048   12.376 2757.270    0.000    0.083    0.091
   .eat2              0.517    0.041   12.484 1886.923    0.000    0.100    0.112
   .eat10             0.353    0.031   11.349 3248.520    0.000    0.076    0.083
   .eat11             0.299    0.027   11.106      Inf    0.000    0.043    0.045
   .eat12             0.532    0.043   12.490 1349.385    0.000    0.119    0.135
   .eat14             0.253    0.026    9.644      Inf    0.000    0.040    0.041
   .eat24             0.596    0.047   12.715  633.051    0.000    0.173    0.210
   .eat3              0.403    0.041    9.850 1498.585    0.000    0.113    0.127
   .eat18             0.453    0.043   10.447  778.068    0.000    0.156    0.185
   .eat21             0.288    0.038    7.564 1162.639    0.000    0.128    0.147
   .drive             1.000                                                      
   .pre               1.000                                                      
Table 3.1: Latent Regression Estimates

                Estimate      t        p    FMI
  pre ~ anx        0.185  8.363  < 0.001  0.087
  pre ~ wsb        0.120  3.668  < 0.001  0.140
  drive ~ anx      0.194  9.228  < 0.001  0.090
  drive ~ wsb      0.178  5.610  < 0.001  0.090
  • Yes, as shown in Table 3.1, anxiety is significantly and positively associated with both preoccupation with food and drive for thinness after controlling for Western beauty standards and BMI.
  • Likewise, Western beauty standards significantly and positively predicts preoccupation with food and drive for thinness after controlling for anxiety and BMI.

3.2 Testing


Given an SEM fitted to multiply imputed data via semTools::lavaan.mi(), we can use the testing utilities from semTools to evaluate (multivariate) hypotheses defined in terms of parameter constraints.

  • In the next two questions, we’ll test one such hypothesis.

3.2.1

Apply a multivariate Wald test to the fitted model from 3.1.4.

  • Test the bivariate null hypothesis that states:
    1. The effect of anxiety on both outcomes is equal
    2. The effect of Western beauty standards on both outcomes is equal
  • Use the \(D1\) statistic to conduct the test.
  • What is the conclusion of this test?

HINT: The semTools::lavTestWald.mi() function can be very helpful here.

## Define the constraints implied by the null hypothesis:
cons <- '
b1p == b1d
b2p == b2d
'

## Test the constraints via a multivariate Wald test:
lavTestWald.mi(miSem1, cons, test = "D1")
       F      df1      df2   pvalue     ariv      fmi 
   1.227    2.000 6923.960    0.293    0.071    0.067 

The Wald test returned a nonsignificant result ( \(F[2, 6923.96] = 1.23\), \(p = 0.293\), \(FMI = 0.067\) ). Hence, we cannot infer a difference in the effect of either predictor on either outcome.


3.2.2

Test the hypothesis from 3.2.1 using a likelihood ratio test (LRT).

  • Estimate a separate restricted SEM using the multiply imputed data.
  • Compare the full and restricted fits using a LRT.
  • Use the \(D3\) statistic to conduct the test.
  • What is the conclusion of this test?
## Define the restricted model by combining the constraint and SEM syntax:
semMod2 <- paste(semMod, cons, sep = '\n')

## Estimate the restricted model:
miSem2 <- sem.mi(semMod2, data = eaImp, std.lv = TRUE)

## Conduct the LRT:
anova(miSem1, miSem2, stat = "D3")
       F      df1      df2   pvalue     ariv      fmi 
   1.236    2.000 5773.641    0.291    0.079    0.073 

As above, the LRT returned a nonsignificant result ( \(F[2, 5773.64] = 1.24\), \(p = 0.291\), \(FMI = 0.073\) ). So, the restricted model does not fit significantly worse than the full model, and, consequently, we cannot infer a difference in the effect of either predictor on either outcome.


3.3 Mediation


Next, we’ll consider methods for modeling mediation when working with MI data. To keep things simple, we’re only going to estimate a single model wherein Western beauty standards predict preoccupation with food, and drive for thinness mediates this relation.

  • You can see the path diagram for this model below.


3.3.1

Define the model syntax for the structural model.

  • Specify all paths needed to test the mediation hypothesis described above.
  • Include a defined parameter to quantify the indirect effect.
## Define only the new structural parts:
semMod3 <- '
pre   ~ b * drive + cp * wsb
drive ~ a * wsb

ab := a * b
'

## Add on the measurement part:
semMod3 <- paste(cfaMod, semMod3, sep = '\n')

Combining mediation with MI is a bit tricky. When treating the missing data with MI, we produce \(M\) different datasets, but we also need to create \(B\) bootstrap samples to test the indirect effects. The best method for combining these two layers of sampling is not immediately obvious, but the literature does provide some guidance. Wu and Jia (2013) recommend nesting bootstrapping within MI via an approach that they call MI(Boot). The MI(Boot) procedure entails the following steps.

  1. Impute the missing data \(M\) times.
  2. For each of these \(M\) imputed datasets, estimate your mediation model as you would on fully observed data.
    • I.e., use \(B\) bootstrap samples to quantify the uncertainty.
  3. Combine the \(M \times B\) resulting estimates of the indirect effect into a single sample with \(N = M \times B\) observations.
  4. Treat this aggregated sample as a single pool of bootstrapped estimates and construct CIs in the usual way.

The MI(Boot) approach is not currently implemented in lavaan or semTools (neither is any alternative method of combining MI and bootstrapping). If you really wanted to, however, you could implement MI(Boot) manually.

  • The following three code chunks demonstrate one potential implementation.

First, we’ll define a function that will take a fitted lavaan object (lavObj) as input and return the bootstrapped estimates of all defined parameters specified in the model syntax of lavObj.

getDefParEstimates <- function(lavObj) {
  ## Use lavaan::lav_partable_constraints_def() to generate a function that 
  ## computes all defined parameters represented in 'lavObj':
  getDefPar <- lavObj %>% parTable() %>% lav_partable_constraints_def()
  
  ## Extract the bootstrapped coefficient estimates from 'lavObj':
  cf <- inspect(lavObj, "coef.boot")

  ## Compute the defined parameters on each bootstrapped sample:
  apply(cf, 1, getDefPar)
}

Now, we’ll use the semList() function from lavaan to fit semMod3 to each of the M = 20 imputed datasets in eaImp.

  • We’ll generate B = 1000 bootstrap samples for each of these 20 imputed datasets.
  • We’ll use the getDefParEstimates() function defined above to extract the relevant estimates (i.e., the bootstrapped estimates of the indirect effect) from each one of our 20 fitted lavaan objects.

NOTE: This code will take quite a long time to run. So, if you want to run the code yourself, consider first doing so with a small number of bootstrap samples (e.g., \(B = 50\)) to get a sense of the run time to expect.

  • If you want a more accurate estimate, you can time this pilot run with system.time() and extrapolate from there.
## Fit the models:
fits <- semList(semMod3, 
                dataList  = eaImp, 
                std.lv    = TRUE, 
                se        = "boot", 
                bootstrap = 1000, 
                FUN       = getDefParEstimates)

Finally, we aggregate the bootstrapped estimates of the indirect effect from above (these live in the funList slot of the fits object) into a single sample with \(N = 20 \times 1000 = 20000\) observations.

  • This aggregated pool of estimates represents the empirical sampling distribution of our indirect effect, so we can summarize this aggregated sample for inference (e.g., by computing the BCa CI of the indirect effect).
## Aggregate estimates:
ab <- fits@funList %>% unlist()

## Use the bca() function from the coxed package to compute the BCa CI for the 
## indirect effect:
coxed::bca(ab)
[1] 0.1054825 0.2158876
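For comparison, a simple percentile interval can be computed from the same aggregated sample (assuming the ab vector created above):

```r
## Percentile CI: take the empirical 2.5th and 97.5th percentiles of the
## 20,000 pooled bootstrap estimates of the indirect effect:
quantile(ab, probs = c(0.025, 0.975))
```

If this interval excludes zero, we reach the same conclusion as with the BCa interval (the two will usually be close when the bootstrap distribution is roughly symmetric).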

The MI(Boot) approach will produce theoretically optimal results, but it does so at a high computational cost. Implementing the approach for lavaan models is also a bit tricky. Thankfully, we have a very satisfactory alternative: the Monte Carlo method proposed by MacKinnon et al. (2004).

The justification for the Monte Carlo method begins with the original motivation for using bootstrapping to test the indirect effect: although the \(a\) and \(b\) paths are each (asymptotically) normally distributed, their product is not. Rather than going fully non-parametric (as in bootstrapping), though, the Monte Carlo method leverages the known sampling distributions of \(a\) and \(b\) to synthesize an empirical sampling distribution of their product (i.e., the indirect effect). The Monte Carlo method entails the following procedure.

  1. Estimate the mediation model on the full dataset.
    • For MI data, this model is estimated in the usual way, and the parameter estimates used in the following steps are the pooled MI estimates.
  2. Use the estimated \(a\) and \(b\) paths along with their asymptotic covariance matrix to define a bivariate normal distribution.
    • This distribution represents an estimate of the joint sampling distribution of \(a\) and \(b\) under the assumption of bivariate normality (which is usually a reasonable assumption).
  3. Generate a very large sample (e.g., \(N \geq 20000\)) from the bivariate normal distribution defined above.
    • Although we need to generate a very large sample here, generating such a sample is computationally trivial and very fast.
  4. For each \(\{a_n, b_n\}\) pair in the above sample, compute the indirect effect, \(ab_n = a_n \times b_n\).
  5. Use the \(N\) estimates of the indirect effect to define an empirical sampling distribution, and summarize this distribution as you would if the sample had been generated by ordinary bootstrapping.
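The steps above are easy to sketch by hand. The following minimal example uses made-up values for the pooled \(a\) and \(b\) estimates and their asymptotic covariance matrix (aHat, bHat, and acov are hypothetical; in practice, they come from the fitted mediation model):

```r
## Hypothetical pooled estimates of the a and b paths and their asymptotic
## covariance matrix (here, assuming uncorrelated estimates):
aHat <- 0.25
bHat <- 0.40
acov <- diag(c(0.03, 0.04)^2)

## Step 3: Sample from the estimated joint (bivariate normal) sampling
## distribution of a and b:
set.seed(235711)
draws <- MASS::mvrnorm(n = 20000, mu = c(aHat, bHat), Sigma = acov)

## Steps 4 & 5: Compute the indirect effect for each draw, and summarize the
## resulting empirical sampling distribution with a percentile CI:
ab <- draws[ , 1] * draws[ , 2]
quantile(ab, probs = c(0.025, 0.975))
```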

Fortunately for us, the Monte Carlo simulation approach is implemented in the monteCarloCI() function from semTools. For more information on this approach, see MacKinnon et al. (2004) and Preacher and Selig (2012).


3.3.2

Use the Monte Carlo simulation approach to test the above mediation hypothesis.

  • Is the indirect effect statistically significant?
## Estimate the mediation model on the MI data:
miSem3 <- sem.mi(semMod3, data = eaImp, std.lv = TRUE)

## Compute Monte Carlo (percentile) CIs:
monteCarloCI(miSem3, plot = TRUE)

Yes, the indirect effect is statistically significant. It is also interesting to note that the Monte Carlo CI is very similar to the MI(Boot) CI estimated above.


4 FIML


Now that we’ve had our fun with MI, let’s explore the final topic of this lab: full information maximum likelihood (FIML).


4.1 Basic FIML


4.1.1

Estimate the CFA defined in 1.3.1 using FIML.

  • Fit the model to the incomplete EA data.
  • Correlate the latent variables.
  • Estimate the mean structure.
  • Set the scale by standardizing the latent variables.
fimlCfa1 <- cfa(cfaMod, data = ea, std.lv = TRUE, missing = "fiml")

To implement basic FIML estimation, we need only add the missing = "fiml" option to our cfa() call.
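For contrast, omitting the missing argument triggers lavaan's default behavior (listwise deletion), which discards every row with any missing indicator. A quick sketch of the comparison (ldCfa is a hypothetical name; this reuses ea and cfaMod from above):

```r
## Without missing = "fiml", lavaan falls back on listwise deletion:
ldCfa <- cfa(cfaMod, data = ea, std.lv = TRUE, meanstructure = TRUE)

## Compare the analyzed sample sizes:
lavInspect(fimlCfa1, "nobs")  # all rows retained under FIML
lavInspect(ldCfa, "nobs")     # fewer rows after listwise deletion
```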


4.1.2

Summarize the fitted model.

  • Include the FMI statistics in your summary.
    • Do any of these FMI values give you pause?
    • How do these FMI estimates compare to those from the MI-based CFA in 3.1.1?
  • Does the number of missing data patterns shown in the summary match the number you calculated in 1.2.5?
    • If not, what do you think causes this discrepancy?
  • Do the parameter estimates look sensible?
  • Does the model fit the data well enough?
summary(fimlCfa1, fmi = TRUE)
lavaan 0.6-12 ended normally after 47 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        31

  Number of observations                           400
  Number of missing patterns                        31

Model Test User Model:
                                                      
  Test statistic                                47.104
  Degrees of freedom                                34
  P-value (Chi-square)                           0.067

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Observed
  Observed information based on                Hessian

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)      FMI
  drive =~                                                     
    eat1              0.741    0.050   14.763    0.000    0.068
    eat2              0.650    0.045   14.383    0.000    0.022
    eat10             0.807    0.043   18.926    0.000    0.042
    eat11             0.764    0.040   19.207    0.000    0.006
    eat12             0.662    0.047   14.056    0.000    0.085
    eat14             0.901    0.041   21.788    0.000    0.005
    eat24             0.623    0.048   12.876    0.000    0.093
  pre =~                                                       
    eat3              0.772    0.046   16.675    0.000    0.020
    eat18             0.749    0.048   15.498    0.000    0.084
    eat21             0.862    0.045   18.956    0.000    0.062

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)      FMI
  drive ~~                                                     
    pre               0.622    0.038   16.195    0.000    0.024

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)      FMI
   .eat1              4.002    0.055   73.099    0.000    0.040
   .eat2              3.934    0.050   79.135    0.000    0.032
   .eat10             3.955    0.050   78.611    0.000    0.032
   .eat11             3.938    0.047   83.777    0.000   -0.000
   .eat12             3.926    0.051   77.399    0.000    0.051
   .eat14             3.963    0.051   77.484    0.000   -0.000
   .eat24             3.980    0.051   77.922    0.000    0.056
   .eat3              3.968    0.050   78.900    0.000   -0.000
   .eat18             3.974    0.052   77.082    0.000    0.046
   .eat21             3.950    0.051   77.923    0.000    0.022
    drive             0.000                                    
    pre               0.000                                    

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)      FMI
   .eat1              0.602    0.048   12.434    0.000    0.075
   .eat2              0.534    0.042   12.707    0.000    0.059
   .eat10             0.329    0.030   11.036    0.000    0.087
   .eat11             0.300    0.026   11.426    0.000    0.023
   .eat12             0.538    0.043   12.457    0.000    0.086
   .eat14             0.235    0.025    9.362    0.000    0.062
   .eat24             0.597    0.047   12.706    0.000    0.087
   .eat3              0.416    0.041   10.020    0.000    0.058
   .eat18             0.453    0.044   10.346    0.000    0.101
   .eat21             0.262    0.039    6.663    0.000    0.088
    drive             1.000                                    
    pre               1.000                                    
fitMeasures(fimlCfa1)
               npar                fmin               chisq                  df 
             31.000               0.059              47.104              34.000 
             pvalue      baseline.chisq         baseline.df     baseline.pvalue 
              0.067            1949.598              45.000               0.000 
                cfi                 tli                nnfi                 rfi 
              0.993               0.991               0.991               0.968 
                nfi                pnfi                 ifi                 rni 
              0.976               0.737               0.993               0.993 
               logl   unrestricted.logl                 aic                 bic 
          -4441.944           -4418.392            8945.888            9069.624 
             ntotal                bic2               rmsea      rmsea.ci.lower 
            400.000            8971.259               0.031               0.000 
     rmsea.ci.upper        rmsea.pvalue                 rmr          rmr_nomean 
              0.051               0.940               0.029               0.032 
               srmr        srmr_bentler srmr_bentler_nomean                crmr 
              0.029               0.029               0.032               0.032 
        crmr_nomean          srmr_mplus   srmr_mplus_nomean               cn_05 
              0.035               0.029               0.032             413.723 
              cn_01                 gfi                agfi                pgfi 
            477.059               0.997               0.994               0.521 
                mfi                ecvi 
              0.984               0.273 
  • None of the FMI values are especially large, and they tend to be a little smaller than the FMI values produced by MI (as shown below).

  • The number of response patterns listed in the summary, 31, is fewer than the number of patterns calculated from the full dataset, 46.

    • This discrepancy arises because we are only analyzing a subset of the incomplete variables here. The total number of response patterns increases when considering the full dataset.
  • Generally, the model looks good. The parameter estimates all seem sensible, and the model fits the data very well.


4.2 Auxiliary Variables


Although the model above seems to have produced satisfactory estimates, the MAR assumption will have been violated unless the missingness on the indicators of drive and pre is associated only with other such indicators. If, for example, missingness on some of the pre items is associated with BMI, then the data are MNAR, and the model estimates are probably biased.

  • We can address this issue by including auxiliary variables (i.e., potential correlates of missingness in which we have no substantive interest) via the Graham (2003) saturated correlates technique.

4.2.1

Rerun the CFA using bmi, anx, and wsb as auxiliary variables.

  • Include the auxiliaries via the saturated correlates approach.
  • Keep all other model parameterization and estimation settings the same as above.

HINT: The semTools::cfa.auxiliary() function can be very helpful here.

fimlCfa2 <- cfa.auxiliary(cfaMod, 
                          data   = ea, 
                          aux    = c("bmi", "anx", "wsb"), 
                          std.lv = TRUE)
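Under the hood, cfa.auxiliary() implements Graham's (2003) saturated correlates parameterization: the auxiliaries covary with each other and with every indicator's residual, while the substantive model is left untouched. A rough, hypothetical sketch of the extra syntax this parameterization implies (the function builds the equivalent parameter table automatically, so you would not write this yourself):

```r
## Hypothetical sketch of the saturated correlates structure added to cfaMod:
auxSyntax <- '
## Auxiliaries covary with one another:
bmi ~~ anx + wsb
anx ~~ wsb

## Auxiliaries covary with every indicator residual:
bmi + anx + wsb ~~ eat1 + eat2 + eat10 + eat11 + eat12 + eat14 + eat24
bmi + anx + wsb ~~ eat3 + eat18 + eat21
'
```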

NOTE: When you run this model, you will probably receive a warning message about a non-positive definite (NPD) residual covariance matrix. As you can see from the eigenvalues below, the residual covariance matrix is indeed NPD.

## Extract the residual covariance matrix:
theta <- lavInspect(fimlCfa2, "theta")

## Indeed, we have a negative eigenvalue => NPD
eigen(theta)$values
 [1] 12.2667170  6.8382415  3.3098964  0.6012041  0.5727693  0.5351208
 [7]  0.4822731  0.4311151  0.3585560  0.3149530  0.2801749  0.2426013
[13] -1.7461727

However, I honestly cannot figure out what’s wrong. As you can see below, the residual covariance matrix itself looks fine (i.e., no Heywood cases, no extreme values, and every covariance falls within the bounds implied by the corresponding variances).

print(theta)
      eat1  eat2  eat10 eat11 eat12 eat14 eat24 eat3  eat18 eat21 bmi   anx   wsb  
eat1  0.604                                                                        
eat2  0.000 0.535                                                                  
eat10 0.000 0.000 0.328                                                            
eat11 0.000 0.000 0.000 0.300                                                      
eat12 0.000 0.000 0.000 0.000 0.538                                                
eat14 0.000 0.000 0.000 0.000 0.000 0.234                                          
eat24 0.000 0.000 0.000 0.000 0.000 0.000 0.599                                    
eat3  0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.415                              
eat18 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.451                        
eat21 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.264                  
bmi   0.940 0.945 0.730 0.740 0.764 0.877 0.887 0.705 0.700 0.773 7.374            
anx   1.435 1.211 1.046 1.191 1.194 1.268 1.256 1.379 1.351 1.279 1.244 9.200      
wsb   0.605 0.485 0.503 0.569 0.486 0.652 0.541 0.403 0.523 0.541 1.118 1.049 3.646

If you want to check my sleuthing w.r.t. this issue, you can find the relevant code here. In the end, I’m willing to tentatively trust these estimates.
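One further way to gauge the severity of the violation is to project theta onto the nearest positive definite matrix and see how far it moves (a sketch using Matrix::nearPD(); thetaPD is a hypothetical name):

```r
## Project theta onto the nearest PD matrix:
thetaPD <- as.matrix(Matrix::nearPD(theta)$mat)

## Maximum absolute change induced by the projection; a small value would
## suggest the NPD problem is numerically mild:
max(abs(thetaPD - theta))
```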


4.2.2

Summarize the CFA model and check the model fit.

  • How do the results compare to the model estimated without auxiliary variables?
summary(fimlCfa2, fmi = TRUE)
lavaan 0.6-12 ended normally after 102 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        70

  Number of observations                           400
  Number of missing patterns                        46

Model Test User Model:
                                                      
  Test statistic                                49.044
  Degrees of freedom                                34
  P-value (Chi-square)                           0.046

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Observed
  Observed information based on                Hessian

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)      FMI
  drive =~                                                     
    eat1              0.741    0.050   14.772    0.000    0.063
    eat2              0.649    0.045   14.369    0.000    0.017
    eat10             0.808    0.043   18.956    0.000    0.042
    eat11             0.764    0.040   19.212    0.000    0.003
    eat12             0.662    0.047   14.055    0.000    0.076
    eat14             0.901    0.041   21.811    0.000    0.007
    eat24             0.622    0.048   12.875    0.000    0.097
  pre =~                                                       
    eat3              0.772    0.046   16.685    0.000    0.027
    eat18             0.751    0.048   15.558    0.000    0.091
    eat21             0.862    0.046   18.917    0.000    0.064

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)      FMI
  drive ~~                                                     
    pre               0.620    0.039   16.078    0.000    0.027
  bmi ~~                                                       
    anx               1.244    0.426    2.917    0.004    0.042
    wsb               1.118    0.275    4.065    0.000    0.065
  anx ~~                                                       
    wsb               1.049    0.311    3.370    0.001    0.094
  bmi ~~                                                       
   .eat1              0.940    0.157    5.987    0.000    0.056
  anx ~~                                                       
   .eat1              1.435    0.181    7.949    0.000    0.072
  wsb ~~                                                       
   .eat1              0.605    0.111    5.431    0.000    0.079
  bmi ~~                                                       
   .eat2              0.945    0.143    6.617    0.000    0.033
  anx ~~                                                       
   .eat2              1.211    0.160    7.592    0.000    0.045
  wsb ~~                                                       
   .eat2              0.485    0.101    4.783    0.000    0.098
  bmi ~~                                                       
   .eat10             0.730    0.142    5.150    0.000    0.038
  anx ~~                                                       
   .eat10             1.046    0.160    6.534    0.000    0.038
  wsb ~~                                                       
   .eat10             0.503    0.103    4.885    0.000    0.097
  bmi ~~                                                       
   .eat11             0.740    0.133    5.574    0.000    0.007
  anx ~~                                                       
   .eat11             1.191    0.154    7.733    0.000    0.028
  wsb ~~                                                       
   .eat11             0.569    0.096    5.953    0.000    0.036
  bmi ~~                                                       
   .eat12             0.764    0.143    5.334    0.000    0.059
  anx ~~                                                       
   .eat12             1.194    0.165    7.250    0.000    0.086
  wsb ~~                                                       
   .eat12             0.486    0.104    4.658    0.000    0.129
  bmi ~~                                                       
   .eat14             0.877    0.146    6.025    0.000    0.005
  anx ~~                                                       
   .eat14             1.268    0.169    7.485    0.000    0.043
  wsb ~~                                                       
   .eat14             0.652    0.106    6.135    0.000    0.065
  bmi ~~                                                       
   .eat24             0.887    0.149    5.967    0.000    0.106
  anx ~~                                                       
   .eat24             1.256    0.166    7.564    0.000    0.098
  wsb ~~                                                       
   .eat24             0.541    0.104    5.213    0.000    0.106
  bmi ~~                                                       
   .eat3              0.705    0.141    5.007    0.000    0.015
  anx ~~                                                       
   .eat3              1.379    0.169    8.179    0.000    0.054
  wsb ~~                                                       
   .eat3              0.403    0.100    4.006    0.000    0.037
  bmi ~~                                                       
   .eat18             0.700    0.145    4.843    0.000    0.074
  anx ~~                                                       
   .eat18             1.351    0.171    7.910    0.000    0.092
  wsb ~~                                                       
   .eat18             0.523    0.105    4.991    0.000    0.100
  bmi ~~                                                       
   .eat21             0.773    0.143    5.398    0.000    0.040
  anx ~~                                                       
   .eat21             1.279    0.169    7.579    0.000    0.078
  wsb ~~                                                       
   .eat21             0.541    0.104    5.183    0.000    0.085

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)      FMI
   .eat1              4.004    0.055   73.123    0.000    0.038
   .eat2              3.937    0.050   79.306    0.000    0.030
   .eat10             3.953    0.050   78.551    0.000    0.032
   .eat11             3.937    0.047   83.777    0.000   -0.000
   .eat12             3.929    0.051   77.411    0.000    0.052
   .eat14             3.962    0.051   77.484    0.000   -0.000
   .eat24             3.986    0.051   78.125    0.000    0.053
   .eat3              3.968    0.050   78.900    0.000   -0.000
   .eat18             3.982    0.052   77.193    0.000    0.046
   .eat21             3.952    0.051   77.906    0.000    0.022
    drive             0.000                                    
    pre               0.000                                    
    bmi              22.406    0.136  164.171    0.000    0.010
    anx              11.971    0.154   77.834    0.000    0.028
    wsb               8.959    0.098   90.996    0.000    0.060

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)      FMI
   .eat1              0.604    0.049   12.436    0.000    0.076
   .eat2              0.535    0.042   12.716    0.000    0.056
   .eat10             0.328    0.030   11.037    0.000    0.085
   .eat11             0.300    0.026   11.428    0.000    0.022
   .eat12             0.538    0.043   12.430    0.000    0.088
   .eat14             0.234    0.025    9.347    0.000    0.060
   .eat24             0.599    0.047   12.663    0.000    0.105
   .eat3              0.415    0.041   10.009    0.000    0.058
   .eat18             0.451    0.044   10.305    0.000    0.106
   .eat21             0.264    0.039    6.712    0.000    0.088
    drive             1.000                                    
    pre               1.000                                    
    bmi               7.374    0.526   14.012    0.000    0.013
    anx               9.200    0.675   13.622    0.000    0.062
    wsb               3.646    0.268   13.615    0.000    0.067
fitMeasures(fimlCfa2)
               npar                fmin               chisq                  df 
             70.000               0.061              49.044              34.000 
             pvalue      baseline.chisq         baseline.df     baseline.pvalue 
              0.046            1956.345              45.000               0.000 
                cfi                 tli                nnfi                 rfi 
              0.992               0.990               0.990               0.967 
                nfi                pnfi                 ifi                 rni 
              0.975               0.737               0.992               0.992 
               logl   unrestricted.logl                 aic                 bic 
          -6957.674           -6933.152           14055.349           14334.751 
             ntotal                bic2               rmsea      rmsea.ci.lower 
            400.000           14112.637               0.033               0.005 
     rmsea.ci.upper        rmsea.pvalue                 rmr          rmr_nomean 
              0.053               0.918               0.038               0.040 
               srmr        srmr_bentler srmr_bentler_nomean                crmr 
              0.025               0.025               0.027               0.027 
        crmr_nomean          srmr_mplus   srmr_mplus_nomean               cn_05 
              0.029               0.026               0.027             397.401 
              cn_01                 gfi                agfi                pgfi 
            458.232               0.999               0.996               0.326 
                mfi                ecvi 
              0.981               0.473 

The results are all very similar, and this model fits equally well. The similarity should not be too surprising given the relatively small amount of missing data. One interesting difference is that the number of missing data patterns reported in the output now matches the number we computed for the entire dataset. This change is also natural, since the model now uses every variable in the EA data.
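The pattern count lavaan reports can also be verified directly from the raw data: every distinct configuration of observed/missing values across the modeled variables is one missing-data pattern. A minimal sketch of the idea, using a small toy data frame in place of the EA data:

```r
## Each unique row of the response-indicator matrix is one missing-data
## pattern. A toy data frame stands in for the EA data here:
toy <- data.frame(x = c(1, NA, 3, NA),
                  y = c(NA, 2, 3, NA),
                  z = c(1,  2, 3,  4))

## TRUE = missing; counting the unique rows gives the number of patterns:
nPatterns <- nrow(unique(is.na(toy)))
nPatterns  # 4 distinct patterns in this toy example
```

Applying the same logic to the EA data (e.g., `nrow(unique(is.na(ea[-1])))`, dropping the `id` column) should reproduce the 46 patterns shown in the output above.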


4.3 Fixed vs. Random \(X\)


4.3.1

Use FIML to estimate the latent regression model defined in 3.1.3.

  • Correlate the latent factors.
  • Set the scale by standardizing the latent variables.
  • Did anything unusual/unexpected happen during model estimation?
    • If you do see some unusual behavior, can you explain what happened?
fimlSem1 <- sem(semMod, data = ea, std.lv = TRUE, missing = "fiml")
Warning in lav_data_full(data = data, group = group, cluster = cluster, : lavaan WARNING: 50 cases were deleted due to missing values in 
          exogenous variable(s), while fixed.x = TRUE.

Some cases have been deleted due to missing values.

We’re treating the observed predictors as fixed (as is customary in OLS regression models, and as we did in the MI-based analysis). This choice caused no issues when analyzing the MI data because the predictors were complete in every imputed dataset. Now, however, the observed predictors enter the analysis in their incomplete state, and FIML cannot handle their missingness because fixed variables are not represented in the likelihood function.

Consequently, any observations with missing data on one of the observed predictor variables are deleted before estimating the model.
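We can see where the 50 deleted cases come from by counting the rows that are incomplete on the exogenous predictors. A sketch of the check, with a toy data frame standing in for the EA data (the columns mimic `bmi`, `anx`, and `wsb`):

```r
## Count cases that are incomplete on any exogenous predictor
## (toy stand-in for the EA data):
toy <- data.frame(bmi = c(21, NA, 23, 24),
                  anx = c(10, 11, NA, 12),
                  wsb = c( 9,  9,  9,  9))

## complete.cases() flags rows with no missing values in these columns:
nDropped <- sum(!complete.cases(toy))
nDropped  # 2 incomplete cases in the toy data
```

With the real data, `sum(!complete.cases(ea[c("bmi", "anx", "wsb")]))` should reproduce the 50 cases flagged in the warning (400 observations total minus the 350 used).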


4.3.2

Summarize the fitted SEM.

  • Include the FMI estimates in the summary.
  • How do the latent regression results compare to those from the MI-based Model?
summary(fimlSem1, fmi = TRUE)
lavaan 0.6-12 ended normally after 99 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        37

                                                  Used       Total
  Number of observations                           350         400
  Number of missing patterns                        28            

Model Test User Model:
                                                      
  Test statistic                               135.890
  Degrees of freedom                                58
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Observed
  Observed information based on                Hessian

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)      FMI
  drive =~                                                     
    eat1              0.544    0.040   13.523    0.000    0.048
    eat2              0.497    0.037   13.579    0.000    0.029
    eat10             0.615    0.036   16.933    0.000   -0.001
    eat11             0.592    0.034   17.401    0.000    0.001
    eat12             0.492    0.037   13.269    0.000    0.064
    eat14             0.666    0.035   18.823    0.000    0.000
    eat24             0.479    0.039   12.171    0.000    0.117
  pre =~                                                       
    eat3              0.629    0.042   15.111    0.000    0.023
    eat18             0.620    0.042   14.608    0.000    0.104
    eat21             0.696    0.042   16.470    0.000    0.029

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)      FMI
  pre ~                                                        
    anx      (b1p)    0.177    0.023    7.778    0.000    0.006
    wsb      (b2p)    0.113    0.034    3.327    0.001    0.013
    bmi               0.099    0.024    4.194    0.000    0.011
  drive ~                                                      
    anx      (b1d)    0.191    0.022    8.800    0.000    0.003
    wsb      (b2d)    0.179    0.033    5.443    0.000    0.003
    bmi               0.129    0.023    5.652    0.000   -0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)      FMI
 .drive ~~                                                     
   .pre               0.389    0.058    6.679    0.000    0.029

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)      FMI
   .eat1              0.289    0.369    0.784    0.433    0.036
   .eat2              0.560    0.335    1.671    0.095   -0.012
   .eat10            -0.207    0.358   -0.577    0.564    0.046
   .eat11            -0.054    0.345   -0.155    0.877   -0.004
   .eat12             0.565    0.334    1.690    0.091    0.024
   .eat14            -0.555    0.371   -1.494    0.135   -0.003
   .eat24             0.716    0.341    2.101    0.036    0.011
   .eat3              0.581    0.380    1.529    0.126    0.004
   .eat18             0.648    0.385    1.686    0.092   -0.006
   .eat21             0.233    0.400    0.582    0.561    0.049
   .drive             0.000                                    
   .pre               0.000                                    

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)      FMI
   .eat1              0.590    0.050   11.866    0.000    0.055
   .eat2              0.508    0.043   11.855    0.000    0.067
   .eat10             0.324    0.031   10.358    0.000    0.074
   .eat11             0.301    0.028   10.723    0.000    0.015
   .eat12             0.496    0.042   11.718    0.000    0.074
   .eat14             0.245    0.026    9.333    0.000    0.031
   .eat24             0.566    0.048   11.793    0.000    0.116
   .eat3              0.415    0.043    9.721    0.000    0.055
   .eat18             0.422    0.044    9.555    0.000    0.101
   .eat21             0.258    0.039    6.697    0.000    0.071
   .drive             1.000                                    
   .pre               1.000                                    
## Latent regression estimates for FIML:
partSummary(fimlSem1, 8, fmi = TRUE)
Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)      FMI
  pre ~                                                        
    anx      (b1p)    0.177    0.023    7.778    0.000    0.006
    wsb      (b2p)    0.113    0.034    3.327    0.001    0.013
    bmi               0.099    0.024    4.194    0.000    0.011
  drive ~                                                      
    anx      (b1d)    0.191    0.022    8.800    0.000    0.003
    wsb      (b2d)    0.179    0.033    5.443    0.000    0.003
    bmi               0.129    0.023    5.652    0.000   -0.000
## Latent regression estimates for MI:
partSummary(miSem1, 2, drops = "riv", fmi = TRUE)
Regressions:
                   Estimate  Std.Err  t-value       df  P(>|t|)      FMI
  pre ~                                                                 
    anx      (b1p)    0.185    0.022    8.363 2533.868    0.000    0.087
    wsb      (b2p)    0.120    0.033    3.668  966.249    0.000    0.140
    bmi               0.094    0.023    4.112      Inf    0.000    0.029
  drive ~                                                               
    anx      (b1d)    0.194    0.021    9.228 2365.604    0.000    0.090
    wsb      (b2d)    0.178    0.032    5.610 2333.176    0.000    0.090
    bmi               0.138    0.022    6.178      Inf    0.000    0.031

The parameter estimates are very similar to those produced by MI, and the inferential conclusions are all the same. The FMI estimates for the FIML-based regression slopes are quite a bit smaller than their MI-based counterparts, though.


4.3.3

Rerun the model from 4.3.1 with random predictors.

  • Model the observed predictor variables as random by setting fixed.x = FALSE in the sem() call.
  • Keep all other settings the same as above.
fimlSem2 <- sem(semMod, 
                data    = ea, 
                std.lv  = TRUE, 
                missing = "fiml", 
                fixed.x = FALSE)

4.3.4

Summarize the fitted SEM from 4.3.3.

  • What differences do you notice between the fixed-\(X\) and random-\(X\) fits?
  • Which of these two models do you think is more appropriate and why?
summary(fimlSem2, fmi = TRUE)
lavaan 0.6-12 ended normally after 114 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        46

  Number of observations                           400
  Number of missing patterns                        46

Model Test User Model:
                                                      
  Test statistic                               127.166
  Degrees of freedom                                58
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Observed
  Observed information based on                Hessian

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)      FMI
  drive =~                                                     
    eat1              0.568    0.038   14.876    0.000    0.076
    eat2              0.497    0.035   14.392    0.000    0.035
    eat10             0.599    0.034   17.410    0.000    0.028
    eat11             0.575    0.032   18.184    0.000    0.006
    eat12             0.504    0.036   13.974    0.000    0.093
    eat14             0.671    0.034   19.768    0.000    0.003
    eat24             0.476    0.037   12.933    0.000    0.105
  pre =~                                                       
    eat3              0.631    0.039   16.142    0.000    0.030
    eat18             0.614    0.040   15.210    0.000    0.108
    eat21             0.690    0.040   17.120    0.000    0.040

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)      FMI
  pre ~                                                        
    anx      (b1p)    0.183    0.022    8.285    0.000    0.066
    wsb      (b2p)    0.114    0.033    3.496    0.000    0.078
    bmi               0.097    0.022    4.320    0.000    0.025
  drive ~                                                      
    anx      (b1d)    0.195    0.021    9.359    0.000    0.045
    wsb      (b2d)    0.172    0.032    5.426    0.000    0.085
    bmi               0.138    0.022    6.318    0.000    0.023

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)      FMI
 .drive ~~                                                     
   .pre               0.392    0.055    7.071    0.000    0.043
  anx ~~                                                       
    wsb               1.004    0.305    3.295    0.001    0.102
    bmi               1.116    0.419    2.665    0.008    0.050
  wsb ~~                                                       
    bmi               1.110    0.273    4.065    0.000    0.066

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)      FMI
   .eat1              0.052    0.362    0.143    0.886    0.035
   .eat2              0.481    0.323    1.489    0.136    0.013
   .eat10            -0.207    0.339   -0.611    0.541    0.059
   .eat11            -0.060    0.322   -0.186    0.852    0.030
   .eat12             0.424    0.329    1.287    0.198    0.039
   .eat14            -0.700    0.358   -1.956    0.051    0.034
   .eat24             0.672    0.327    2.055    0.040    0.031
   .eat3              0.579    0.363    1.596    0.110    0.028
   .eat18             0.681    0.363    1.876    0.061    0.004
   .eat21             0.245    0.383    0.639    0.523    0.070
    anx              11.960    0.152   78.529    0.000    0.032
    wsb               8.959    0.098   91.276    0.000    0.059
    bmi              22.400    0.136  164.413    0.000    0.011
   .drive             0.000                                    
   .pre               0.000                                    

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)      FMI
   .eat1              0.582    0.047   12.357    0.000    0.079
   .eat2              0.521    0.041   12.662    0.000    0.058
   .eat10             0.346    0.031   11.268    0.000    0.088
   .eat11             0.300    0.026   11.512    0.000    0.023
   .eat12             0.528    0.042   12.442    0.000    0.085
   .eat14             0.252    0.026    9.856    0.000    0.050
   .eat24             0.584    0.046   12.673    0.000    0.101
   .eat3              0.406    0.040   10.031    0.000    0.060
   .eat18             0.442    0.043   10.325    0.000    0.098
   .eat21             0.282    0.038    7.381    0.000    0.097
   .drive             1.000                                    
   .pre               1.000                                    
    anx               8.983    0.652   13.768    0.000    0.063
    wsb               3.627    0.265   13.682    0.000    0.067
    bmi               7.346    0.522   14.069    0.000    0.011

Naturally, we can observe a few differences when modeling the predictors as random variables.

  1. We use all 400 observations because none need to be deleted due to unmodeled missing data.
  2. We estimate means and variances for the observed predictors in the random-\(X\) model and not in the fixed-\(X\) model.
  3. The estimates are somewhat different between the two models, most importantly the latent regression estimates.

Apart from the obviously problematic deletion of the incomplete cases when \(X\) is treated as fixed, we cannot really say that one approach is better, or more correct, than the other. Treating the predictors as fixed versus random is not simply an estimation choice (as, for example, selecting a different optimization algorithm or adjusting the convergence criteria would be). When we model \(X\) as random, we are specifying a fundamentally different model than we are when \(X\) is treated as fixed. In the former case, our latent regression model summarizes the joint distribution of \(Y\) and \(X\). In the latter case, \(X\) has no distribution and simply serves to define the mean of \(Y\). Neither of these is the “correct” way to represent the data; they’re just different. As researchers, we need to be aware of the model we’re estimating, and ensure that said model aligns with our inferential objectives.


4.4 Mediation


4.4.1

Use FIML to estimate the mediation model defined in 3.3.1.

  • Include bmi and anx as auxiliary variables.
  • Use B = 2000 bootstrap resamples to quantify the sampling variability.

NOTE: This code will take a long time to run. You may want to consider using the run timing strategies discussed in the MI(Boot) section.

## Estimate the mediation model using FIML with auxiliaries and bootstrapping:
fimlSem3 <- sem.auxiliary(semMod3, 
                          data      = ea, 
                          aux       = c("bmi", "anx"),
                          std.lv    = TRUE, 
                          se        = "boot", 
                          bootstrap = 2000)

4.4.2

Summarize the fitted mediation model and interpret the results.

  • Is the indirect effect through Drive for Thinness significant according to the 95% BC CI?
  • Briefly interpret the indirect effects.
summary(fimlSem3, fmi = TRUE)
lavaan 0.6-12 ended normally after 111 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        62

  Number of observations                           400
  Number of missing patterns                        46

Model Test User Model:
                                                      
  Test statistic                                55.854
  Degrees of freedom                                42
  P-value (Chi-square)                           0.075

Parameter Estimates:

  Standard errors                            Bootstrap
  Number of requested bootstrap draws             2000
  Number of successful bootstrap draws            2000

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  drive =~                                            
    eat1              0.685    0.046   14.820    0.000
    eat2              0.600    0.044   13.641    0.000
    eat10             0.744    0.049   15.090    0.000
    eat11             0.706    0.044   15.910    0.000
    eat12             0.612    0.049   12.382    0.000
    eat14             0.832    0.044   18.880    0.000
    eat24             0.577    0.046   12.432    0.000
  pre =~                                              
    eat3              0.602    0.044   13.709    0.000
    eat18             0.588    0.045   13.035    0.000
    eat21             0.675    0.044   15.372    0.000

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  pre ~                                               
    drive      (b)    0.699    0.086    8.102    0.000
    wsb       (cp)    0.049    0.034    1.427    0.154
  drive ~                                             
    wsb        (a)    0.219    0.032    6.760    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  bmi ~~                                              
    anx               1.239    0.407    3.048    0.002
   .eat1              0.780    0.139    5.602    0.000
  anx ~~                                              
   .eat1              1.302    0.175    7.437    0.000
  bmi ~~                                              
   .eat2              0.806    0.135    5.979    0.000
  anx ~~                                              
   .eat2              1.091    0.164    6.659    0.000
  bmi ~~                                              
   .eat10             0.565    0.132    4.277    0.000
  anx ~~                                              
   .eat10             0.884    0.167    5.311    0.000
  bmi ~~                                              
   .eat11             0.579    0.124    4.686    0.000
  anx ~~                                              
   .eat11             1.052    0.150    7.015    0.000
  bmi ~~                                              
   .eat12             0.624    0.120    5.200    0.000
  anx ~~                                              
   .eat12             1.073    0.180    5.974    0.000
  bmi ~~                                              
   .eat14             0.688    0.130    5.300    0.000
  anx ~~                                              
   .eat14             1.100    0.172    6.402    0.000
  bmi ~~                                              
   .eat24             0.746    0.130    5.731    0.000
  anx ~~                                              
   .eat24             1.148    0.157    7.305    0.000
  bmi ~~                                              
   .eat3              0.579    0.147    3.935    0.000
  anx ~~                                              
   .eat3              1.261    0.163    7.736    0.000
  bmi ~~                                              
   .eat18             0.564    0.146    3.853    0.000
  anx ~~                                              
   .eat18             1.247    0.164    7.587    0.000
  bmi ~~                                              
   .eat21             0.626    0.138    4.535    0.000
  anx ~~                                              
   .eat21             1.157    0.156    7.407    0.000
  bmi ~~                                              
    wsb               1.052    0.276    3.809    0.000
  anx ~~                                              
    wsb               0.927    0.312    2.974    0.003

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .eat1              2.660    0.204   13.055    0.000
   .eat2              2.762    0.183   15.124    0.000
   .eat10             2.494    0.214   11.655    0.000
   .eat11             2.554    0.209   12.213    0.000
   .eat12             2.731    0.190   14.371    0.000
   .eat14             2.332    0.234    9.987    0.000
   .eat24             2.854    0.192   14.837    0.000
   .eat3              2.879    0.191   15.084    0.000
   .eat18             2.919    0.201   14.555    0.000
   .eat21             2.733    0.217   12.603    0.000
    wsb               8.958    0.097   92.391    0.000
   .drive             0.000                           
   .pre               0.000                           
    bmi              22.406    0.139  161.330    0.000
    anx              11.971    0.153   78.194    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .eat1              0.602    0.068    8.905    0.000
   .eat2              0.534    0.053   10.071    0.000
   .eat10             0.332    0.039    8.543    0.000
   .eat11             0.299    0.027   11.227    0.000
   .eat12             0.538    0.058    9.335    0.000
   .eat14             0.234    0.029    8.168    0.000
   .eat24             0.596    0.048   12.509    0.000
   .eat3              0.418    0.043    9.814    0.000
   .eat18             0.450    0.050    9.073    0.000
   .eat21             0.262    0.042    6.199    0.000
   .drive             1.000                           
   .pre               1.000                           
    wsb               3.627    0.252   14.369    0.000
    bmi               7.360    0.497   14.796    0.000
    anx               9.225    0.685   13.475    0.000

Defined Parameters:
                   Estimate  Std.Err  z-value  P(>|z|)
    ab                0.153    0.027    5.563    0.000
parameterEstimates(fimlSem3, boot.ci.type = "bca.simple") %>% 
  select(-(1:3)) %>%
  tail(1)
  • Yes, the indirect effect through Drive for Thinness is statistically significant ( \(ab = 0.153\), \(95\%~CI_{BC} = [0.105; 0.212]\) ).
    • As an aside, it is also interesting to note how similar this CI is to the one produced by MI(Boot).
  • The results suggest a statistically significant, positive indirect effect by which the influence of Western beauty standards on preoccupation with food is transmitted through drive for thinness. This effect implies a process whereby higher levels of Western beauty standards promote a greater drive for thinness which, in turn, contributes to increasing preoccupation with food.
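As a quick arithmetic check, the defined parameter `ab` in the output above is simply the product of the `a` and `b` path estimates:

```r
## The indirect effect ab is the product of the two constituent paths:
a  <- 0.219  # wsb -> drive (the a path)
b  <- 0.699  # drive -> pre (the b path)
ab <- a * b
round(ab, 3)  # 0.153, matching the "Defined Parameters" output
```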

End of Lab 6